Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 150000 |
| Missing cells | 33655 |
| Missing cells (%) | 1.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 13.7 MiB |
| Average record size in memory | 96.0 B |
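The summary figures above can be cross-checked with a little arithmetic. The sketch below copies the counts from the report and assumes, as the 96.0 B record size suggests, that each of the 12 columns is stored as an 8-byte value; that storage width is an assumption, not something the report states.

```python
# Figures copied from the report; the per-column missing counts come
# from the MonthlyIncome and NumberOfDependents entries.
n_rows = 150_000
n_cols = 12
missing_by_column = {"MonthlyIncome": 29_731, "NumberOfDependents": 3_924}

total_cells = n_rows * n_cols
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

# Assumption: every column holds 8-byte values (int64 or float64).
bytes_per_value = 8
record_bytes = n_cols * bytes_per_value
total_mib = n_rows * record_bytes / 2**20

print(missing_cells)          # 33655
print(round(missing_pct, 1))  # 1.9
print(record_bytes)           # 96
print(round(total_mib, 1))    # 13.7
```

Under that assumption the two columns with missing values account for every missing cell, and the memory figures follow directly from the row and column counts.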
Variable types
| NUM | 11 |
|---|---|
| BOOL | 1 |
Reproduction
| Analysis started | 2020-08-11 02:00:25.415873 |
|---|---|
| Analysis finished | 2020-08-11 02:01:17.018433 |
| Duration | 51.6 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| NumberOfTimes90DaysLate is highly correlated with NumberOfTime30-59DaysPastDueNotWorse and 1 other fields | High correlation |
| NumberOfTime30-59DaysPastDueNotWorse is highly correlated with NumberOfTimes90DaysLate and 1 other fields | High correlation |
| NumberOfTime60-89DaysPastDueNotWorse is highly correlated with NumberOfTime30-59DaysPastDueNotWorse and 1 other fields | High correlation |
| MonthlyIncome has 29731 (19.8%) missing values | Missing |
| NumberOfDependents has 3924 (2.6%) missing values | Missing |
| RevolvingUtilizationOfUnsecuredLines is highly skewed (γ1 = 97.63157449) | Skewed |
| NumberOfTime30-59DaysPastDueNotWorse is highly skewed (γ1 = 22.59710756) | Skewed |
| DebtRatio is highly skewed (γ1 = 95.15779287) | Skewed |
| MonthlyIncome is highly skewed (γ1 = 114.0403179) | Skewed |
| NumberOfTimes90DaysLate is highly skewed (γ1 = 23.08734547) | Skewed |
| NumberOfTime60-89DaysPastDueNotWorse is highly skewed (γ1 = 23.33174312) | Skewed |
| Unnamed: 0 has unique values | Unique |
| RevolvingUtilizationOfUnsecuredLines has 10878 (7.3%) zeros | Zeros |
| NumberOfTime30-59DaysPastDueNotWorse has 126018 (84.0%) zeros | Zeros |
| DebtRatio has 4113 (2.7%) zeros | Zeros |
| MonthlyIncome has 1634 (1.1%) zeros | Zeros |
| NumberOfOpenCreditLinesAndLoans has 1888 (1.3%) zeros | Zeros |
| NumberOfTimes90DaysLate has 141662 (94.4%) zeros | Zeros |
| NumberRealEstateLoansOrLines has 56188 (37.5%) zeros | Zeros |
| NumberOfTime60-89DaysPastDueNotWorse has 142396 (94.9%) zeros | Zeros |
| NumberOfDependents has 86902 (57.9%) zeros | Zeros |
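The γ1 values quoted in the skewness entries are skewness coefficients, that is, third standardized moments. As a minimal sketch of what that quantity measures, the function below computes the population form g1 = m3 / m2^1.5 in plain Python. This is an illustration rather than the report's own code: pandas' `Series.skew` applies a bias correction, so its results differ slightly on small samples. The `zero_heavy` list is toy data, not taken from the dataset.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
# Toy data: mostly zeros plus one large value, loosely imitating the
# shape of the past-due counters flagged in the report.
zero_heavy = [0] * 95 + [1, 1, 2, 3, 98]

print(skewness(symmetric))   # 0.0 for a symmetric sample
print(skewness(zero_heavy))  # large and positive: long right tail
```

A column dominated by zeros with a handful of very large entries produces exactly this kind of large positive coefficient.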
Unnamed: 0
Real number (ℝ≥0)
| Distinct count | 150000 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 75000.5 |
|---|---|
| Minimum | 1 |
| Maximum | 150000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 7500.95 |
| Q1 | 37500.75 |
| median | 75000.5 |
| Q3 | 112500.25 |
| 95-th percentile | 142500.05 |
| Maximum | 150000 |
| Range | 149999 |
| Interquartile range (IQR) | 74999.5 |
Descriptive statistics
| Standard deviation | 43301.41453 |
|---|---|
| Coefficient of variation (CV) | 0.5773483447 |
| Kurtosis | -1.2 |
| Mean | 75000.5 |
| Median Absolute Deviation (MAD) | 37500 |
| Skewness | 0 |
| Sum | 1.1250075e+10 |
| Variance | 1875012500 |
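These statistics are consistent with a column that simply holds the integers 1 through 150000, as the unique-values entry for `Unnamed: 0` suggests. Assuming that is the case, the reported values follow from closed-form expressions for the sequence 1..n, as the sketch below checks. The variance uses the sample (n − 1) convention, which is what matches the reported figure.

```python
import math

n = 150_000  # number of observations in the report

total = n * (n + 1) // 2     # sum of 1..n
mean = (n + 1) / 2
variance = n * (n + 1) / 12  # sample variance (n - 1 denominator) of 1..n
std = math.sqrt(variance)
cv = std / mean              # coefficient of variation

print(total)     # 11250075000
print(mean)      # 75000.5
print(variance)  # 1875012500.0

# Compare against the values printed in the report.
print(math.isclose(std, 43301.41453, rel_tol=1e-9))
print(math.isclose(cv, 0.5773483447, rel_tol=1e-9))
```

A column like this is typically a leftover row index from the CSV export, so it carries no predictive information.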
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2047 | 1 | < 0.1% | |
| 107806 | 1 | < 0.1% | |
| 9518 | 1 | < 0.1% | |
| 15661 | 1 | < 0.1% | |
| 13612 | 1 | < 0.1% | |
| 3371 | 1 | < 0.1% | |
| 1322 | 1 | < 0.1% | |
| 7465 | 1 | < 0.1% | |
| 5416 | 1 | < 0.1% | |
| 27943 | 1 | < 0.1% | |
| Other values (149990) | 149990 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 150000 | 1 | < 0.1% | |
| 149999 | 1 | < 0.1% | |
| 149998 | 1 | < 0.1% | |
| 149997 | 1 | < 0.1% | |
| 149996 | 1 | < 0.1% |
SeriousDlqin2yrs
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
| 0 | 139974 |
|---|---|
| 1 | 10026 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 139974 | 93.3% | |
| 1 | 10026 | 6.7% |
RevolvingUtilizationOfUnsecuredLines
Real number (ℝ≥0)
| Distinct count | 125728 |
|---|---|
| Unique (%) | 83.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.048438054666888 |
|---|---|
| Minimum | 0.0 |
| Maximum | 50708.0 |
| Zeros | 10878 |
| Zeros (%) | 7.3% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.029867442 |
| median | 0.154180737 |
| Q3 | 0.5590462475 |
| 95-th percentile | 0.9999999 |
| Maximum | 50708 |
| Range | 50708 |
| Interquartile range (IQR) | 0.5291788055 |
Descriptive statistics
| Standard deviation | 249.7553706 |
|---|---|
| Coefficient of variation (CV) | 41.29254005 |
| Kurtosis | 14544.71341 |
| Mean | 6.048438055 |
| Median Absolute Deviation (MAD) | 0.148325347 |
| Skewness | 97.63157449 |
| Sum | 907265.7082 |
| Variance | 62377.74516 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 10878 | 7.3% | |
| 0.9999999 | 10256 | 6.8% | |
| 1 | 17 | < 0.1% | |
| 0.9500998 | 8 | < 0.1% | |
| 0.71314741 | 6 | < 0.1% | |
| 0.007984032 | 6 | < 0.1% | |
| 0.954091816 | 6 | < 0.1% | |
| 0.796407186 | 5 | < 0.1% | |
| 0.850299401 | 5 | < 0.1% | |
| 0.538922156 | 5 | < 0.1% | |
| Other values (125718) | 128808 | 85.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 10878 | 7.3% | |
| 8.37e-06 | 1 | < 0.1% | |
| 9.93e-06 | 1 | < 0.1% | |
| 1.25e-05 | 1 | < 0.1% | |
| 1.43e-05 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 50708 | 1 | < 0.1% | |
| 29110 | 1 | < 0.1% | |
| 22198 | 1 | < 0.1% | |
| 22000 | 1 | < 0.1% | |
| 20514 | 1 | < 0.1% |
age
Real number (ℝ≥0)
| Distinct count | 86 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52.295206666666665 |
|---|---|
| Minimum | 0 |
| Maximum | 109 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 41 |
| median | 52 |
| Q3 | 63 |
| 95-th percentile | 78 |
| Maximum | 109 |
| Range | 109 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 14.77186586 |
|---|---|
| Coefficient of variation (CV) | 0.2824707426 |
| Kurtosis | -0.4946688326 |
| Mean | 52.29520667 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.1889945451 |
| Sum | 7844281 |
| Variance | 218.2080211 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 49 | 3837 | 2.6% | |
| 48 | 3806 | 2.5% | |
| 50 | 3753 | 2.5% | |
| 63 | 3719 | 2.5% | |
| 47 | 3719 | 2.5% | |
| 46 | 3714 | 2.5% | |
| 53 | 3648 | 2.4% | |
| 51 | 3627 | 2.4% | |
| 52 | 3609 | 2.4% | |
| 56 | 3589 | 2.4% | |
| Other values (76) | 112979 | 75.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | < 0.1% | |
| 21 | 183 | 0.1% | |
| 22 | 434 | 0.3% | |
| 23 | 641 | 0.4% | |
| 24 | 816 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 109 | 2 | < 0.1% | |
| 107 | 1 | < 0.1% | |
| 105 | 1 | < 0.1% | |
| 103 | 3 | < 0.1% | |
| 102 | 3 | < 0.1% |
NumberOfTime30-59DaysPastDueNotWorse
Real number (ℝ≥0)
| Distinct count | 16 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4210333333333333 |
|---|---|
| Minimum | 0 |
| Maximum | 98 |
| Zeros | 126018 |
| Zeros (%) | 84.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 98 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 4.192781272 |
|---|---|
| Coefficient of variation (CV) | 9.958311944 |
| Kurtosis | 522.3765449 |
| Mean | 0.4210333333 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 22.59710756 |
| Sum | 63155 |
| Variance | 17.57941479 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 126018 | 84.0% | |
| 1 | 16033 | 10.7% | |
| 2 | 4598 | 3.1% | |
| 3 | 1754 | 1.2% | |
| 4 | 747 | 0.5% | |
| 5 | 342 | 0.2% | |
| 98 | 264 | 0.2% | |
| 6 | 140 | 0.1% | |
| 7 | 54 | < 0.1% | |
| 8 | 25 | < 0.1% | |
| Other values (6) | 25 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 126018 | 84.0% | |
| 1 | 16033 | 10.7% | |
| 2 | 4598 | 3.1% | |
| 3 | 1754 | 1.2% | |
| 4 | 747 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 98 | 264 | 0.2% | |
| 96 | 5 | < 0.1% | |
| 13 | 1 | < 0.1% | |
| 12 | 2 | < 0.1% | |
| 11 | 1 | < 0.1% |
DebtRatio
Real number (ℝ≥0)
| Distinct count | 114194 |
|---|---|
| Unique (%) | 76.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 353.00507576386985 |
|---|---|
| Minimum | 0.0 |
| Maximum | 329664.0 |
| Zeros | 4113 |
| Zeros (%) | 2.7% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.004329004 |
| Q1 | 0.1750738323 |
| median | 0.366507841 |
| Q3 | 0.8682537732 |
| 95-th percentile | 2449 |
| Maximum | 329664 |
| Range | 329664 |
| Interquartile range (IQR) | 0.693179941 |
Descriptive statistics
| Standard deviation | 2037.818523 |
|---|---|
| Coefficient of variation (CV) | 5.772774 |
| Kurtosis | 13734.28886 |
| Mean | 353.0050758 |
| Median Absolute Deviation (MAD) | 0.2457227975 |
| Skewness | 95.15779287 |
| Sum | 52950761.36 |
| Variance | 4152704.333 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 4113 | 2.7% | |
| 1 | 229 | 0.2% | |
| 4 | 174 | 0.1% | |
| 2 | 170 | 0.1% | |
| 3 | 162 | 0.1% | |
| 5 | 143 | 0.1% | |
| 9 | 125 | 0.1% | |
| 10 | 117 | 0.1% | |
| 7 | 115 | 0.1% | |
| 13 | 114 | 0.1% | |
| Other values (114184) | 144538 | 96.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 4113 | 2.7% | |
| 2.6e-05 | 1 | < 0.1% | |
| 3.69e-05 | 1 | < 0.1% | |
| 3.93e-05 | 1 | < 0.1% | |
| 6.62e-05 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 329664 | 1 | < 0.1% | |
| 326442 | 1 | < 0.1% | |
| 307001 | 1 | < 0.1% | |
| 220516 | 1 | < 0.1% | |
| 168835 | 1 | < 0.1% |
MonthlyIncome
Real number (ℝ≥0)
| Distinct count | 13594 |
|---|---|
| Unique (%) | 11.3% |
| Missing | 29731 |
| Missing (%) | 19.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6670.221237392844 |
|---|---|
| Minimum | 0.0 |
| Maximum | 3008750.0 |
| Zeros | 1634 |
| Zeros (%) | 1.1% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1300 |
| Q1 | 3400 |
| median | 5400 |
| Q3 | 8249 |
| 95-th percentile | 14587.6 |
| Maximum | 3008750 |
| Range | 3008750 |
| Interquartile range (IQR) | 4849 |
Descriptive statistics
| Standard deviation | 14384.67422 |
|---|---|
| Coefficient of variation (CV) | 2.15655129 |
| Kurtosis | 19504.7054 |
| Mean | 6670.221237 |
| Median Absolute Deviation (MAD) | 2317 |
| Skewness | 114.0403179 |
| Sum | 802220838 |
| Variance | 206918852.3 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 5000 | 2757 | 1.8% | |
| 4000 | 2106 | 1.4% | |
| 6000 | 1934 | 1.3% | |
| 3000 | 1758 | 1.2% | |
| 0 | 1634 | 1.1% | |
| 2500 | 1551 | 1.0% | |
| 10000 | 1466 | 1.0% | |
| 3500 | 1360 | 0.9% | |
| 4500 | 1226 | 0.8% | |
| 7000 | 1223 | 0.8% | |
| Other values (13584) | 103254 | 68.8% | |
| (Missing) | 29731 | 19.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1634 | 1.1% | |
| 1 | 605 | 0.4% | |
| 2 | 6 | < 0.1% | |
| 4 | 2 | < 0.1% | |
| 5 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3008750 | 1 | < 0.1% | |
| 1794060 | 1 | < 0.1% | |
| 1560100 | 1 | < 0.1% | |
| 1072500 | 1 | < 0.1% | |
| 835040 | 1 | < 0.1% |
NumberOfOpenCreditLinesAndLoans
Real number (ℝ≥0)
| Distinct count | 58 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.45276 |
|---|---|
| Minimum | 0 |
| Maximum | 58 |
| Zeros | 1888 |
| Zeros (%) | 1.3% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 8 |
| Q3 | 11 |
| 95-th percentile | 18 |
| Maximum | 58 |
| Range | 58 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 5.14595099 |
|---|---|
| Coefficient of variation (CV) | 0.6087894356 |
| Kurtosis | 3.091066746 |
| Mean | 8.45276 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.21531378 |
| Sum | 1267914 |
| Variance | 26.48081159 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6 | 13614 | 9.1% | |
| 7 | 13245 | 8.8% | |
| 5 | 12931 | 8.6% | |
| 8 | 12562 | 8.4% | |
| 4 | 11609 | 7.7% | |
| 9 | 11355 | 7.6% | |
| 10 | 9624 | 6.4% | |
| 3 | 9058 | 6.0% | |
| 11 | 8321 | 5.5% | |
| 12 | 7005 | 4.7% | |
| Other values (48) | 40676 | 27.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1888 | 1.3% | |
| 1 | 4438 | 3.0% | |
| 2 | 6666 | 4.4% | |
| 3 | 9058 | 6.0% | |
| 4 | 11609 | 7.7% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 58 | 1 | < 0.1% | |
| 57 | 2 | < 0.1% | |
| 56 | 2 | < 0.1% | |
| 54 | 4 | < 0.1% | |
| 53 | 1 | < 0.1% |
NumberOfTimes90DaysLate
Real number (ℝ≥0)
| Distinct count | 19 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.26597333333333334 |
|---|---|
| Minimum | 0 |
| Maximum | 98 |
| Zeros | 141662 |
| Zeros (%) | 94.4% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 98 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 4.169303788 |
|---|---|
| Coefficient of variation (CV) | 15.67564588 |
| Kurtosis | 537.7389446 |
| Mean | 0.2659733333 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 23.08734547 |
| Sum | 39896 |
| Variance | 17.38309407 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 141662 | 94.4% | |
| 1 | 5243 | 3.5% | |
| 2 | 1555 | 1.0% | |
| 3 | 667 | 0.4% | |
| 4 | 291 | 0.2% | |
| 98 | 264 | 0.2% | |
| 5 | 131 | 0.1% | |
| 6 | 80 | 0.1% | |
| 7 | 38 | < 0.1% | |
| 8 | 21 | < 0.1% | |
| Other values (9) | 48 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 141662 | 94.4% | |
| 1 | 5243 | 3.5% | |
| 2 | 1555 | 1.0% | |
| 3 | 667 | 0.4% | |
| 4 | 291 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 98 | 264 | 0.2% | |
| 96 | 5 | < 0.1% | |
| 17 | 1 | < 0.1% | |
| 15 | 2 | < 0.1% | |
| 14 | 2 | < 0.1% |
NumberRealEstateLoansOrLines
Real number (ℝ≥0)
| Distinct count | 28 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.01824 |
|---|---|
| Minimum | 0 |
| Maximum | 54 |
| Zeros | 56188 |
| Zeros (%) | 37.5% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 54 |
| Range | 54 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.129770985 |
|---|---|
| Coefficient of variation (CV) | 1.109533101 |
| Kurtosis | 60.47680765 |
| Mean | 1.01824 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 3.482483994 |
| Sum | 152736 |
| Variance | 1.276382478 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 56188 | 37.5% | |
| 1 | 52338 | 34.9% | |
| 2 | 31522 | 21.0% | |
| 3 | 6300 | 4.2% | |
| 4 | 2170 | 1.4% | |
| 5 | 689 | 0.5% | |
| 6 | 320 | 0.2% | |
| 7 | 171 | 0.1% | |
| 8 | 93 | 0.1% | |
| 9 | 78 | 0.1% | |
| Other values (18) | 131 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 56188 | 37.5% | |
| 1 | 52338 | 34.9% | |
| 2 | 31522 | 21.0% | |
| 3 | 6300 | 4.2% | |
| 4 | 2170 | 1.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 54 | 1 | < 0.1% | |
| 32 | 1 | < 0.1% | |
| 29 | 1 | < 0.1% | |
| 26 | 1 | < 0.1% | |
| 25 | 3 | < 0.1% |
NumberOfTime60-89DaysPastDueNotWorse
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.24038666666666667 |
|---|---|
| Minimum | 0 |
| Maximum | 98 |
| Zeros | 142396 |
| Zeros (%) | 94.9% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 98 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 4.155179421 |
|---|---|
| Coefficient of variation (CV) | 17.28539889 |
| Kurtosis | 545.6827435 |
| Mean | 0.2403866667 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 23.33174312 |
| Sum | 36058 |
| Variance | 17.26551602 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 142396 | 94.9% | |
| 1 | 5731 | 3.8% | |
| 2 | 1118 | 0.7% | |
| 3 | 318 | 0.2% | |
| 98 | 264 | 0.2% | |
| 4 | 105 | 0.1% | |
| 5 | 34 | < 0.1% | |
| 6 | 16 | < 0.1% | |
| 7 | 9 | < 0.1% | |
| 96 | 5 | < 0.1% | |
| Other values (3) | 4 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 142396 | 94.9% | |
| 1 | 5731 | 3.8% | |
| 2 | 1118 | 0.7% | |
| 3 | 318 | 0.2% | |
| 4 | 105 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 98 | 264 | 0.2% | |
| 96 | 5 | < 0.1% | |
| 11 | 1 | < 0.1% | |
| 9 | 1 | < 0.1% | |
| 8 | 2 | < 0.1% |
NumberOfDependents
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 3924 |
| Missing (%) | 2.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.7572222678605657 |
|---|---|
| Minimum | 0.0 |
| Maximum | 20.0 |
| Zeros | 86902 |
| Zeros (%) | 57.9% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.115086071 |
|---|---|
| Coefficient of variation (CV) | 1.472600739 |
| Kurtosis | 3.001656811 |
| Mean | 0.7572222679 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.588242379 |
| Sum | 110612 |
| Variance | 1.243416947 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 86902 | 57.9% | |
| 1 | 26316 | 17.5% | |
| 2 | 19522 | 13.0% | |
| 3 | 9483 | 6.3% | |
| 4 | 2862 | 1.9% | |
| 5 | 746 | 0.5% | |
| 6 | 158 | 0.1% | |
| 7 | 51 | < 0.1% | |
| 8 | 24 | < 0.1% | |
| 9 | 5 | < 0.1% | |
| Other values (3) | 7 | < 0.1% | |
| (Missing) | 3924 | 2.6% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 86902 | 57.9% | |
| 1 | 26316 | 17.5% | |
| 2 | 19522 | 13.0% | |
| 3 | 9483 | 6.3% | |
| 4 | 2862 | 1.9% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20 | 1 | < 0.1% | |
| 13 | 1 | < 0.1% | |
| 10 | 5 | < 0.1% | |
| 9 | 5 | < 0.1% | |
| 8 | 24 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
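The calculation described above, covariance divided by the product of the standard deviations, can be sketched in plain Python. The function name `pearson_r` and the sample lists are illustrative only and are not part of the report. The 1/n factors in the covariance and the standard deviations cancel, so the sketch works with raw sums.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their std devs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations; the common 1/n cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]  # illustrative values

r_original = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 7 for a in x], [0.5 * b - 2 for b in y])
print(math.isclose(r_original, r_rescaled))  # True
```

The final comparison illustrates the invariance under changes of location and scale mentioned in the description.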
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
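As a minimal sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. Giving tied values the average of their ranks is a common convention and is assumed here; the helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r computed from raw sums of deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
cubed = [v ** 3 for v in x]  # monotonic but nonlinear in x

print(spearman_rho(x, cubed))  # 1.0: perfectly monotonic
print(pearson_r(x, cubed))     # below 1: the relation is not linear
```

The cubic example shows why ρ catches monotonic relationships that r understates.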
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
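The pair-counting definition above corresponds to the tau-a variant and can be sketched directly. Note the assumption: libraries such as SciPy default to tau-b, which additionally corrects for ties, so results on tied data such as the zero-heavy columns in this dataset would differ from this simple version. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
print(kendall_tau_a(x, [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a(x, [4, 3, 2, 1]))  # -1.0
print(kendall_tau_a(x, [1, 3, 2, 4]))  # 5 concordant, 1 discordant of 6 pairs
```

This double loop is quadratic in the number of observations, so it is meant for illustration rather than for a 150000-row dataset.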
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 0 | 6 | 0 | 2.0 |
| 1 | 2 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 0 | 0 | 1.0 |
| 2 | 3 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042.0 | 2 | 1 | 0 | 0 | 0.0 |
| 3 | 4 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300.0 | 5 | 0 | 0 | 0 | 0.0 |
| 4 | 5 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 0 | 1 | 0 | 0.0 |
| 5 | 6 | 0 | 0.213179 | 74 | 0 | 0.375607 | 3500.0 | 3 | 0 | 1 | 0 | 1.0 |
| 6 | 7 | 0 | 0.305682 | 57 | 0 | 5710.000000 | NaN | 8 | 0 | 3 | 0 | 0.0 |
| 7 | 8 | 0 | 0.754464 | 39 | 0 | 0.209940 | 3500.0 | 8 | 0 | 0 | 0 | 0.0 |
| 8 | 9 | 0 | 0.116951 | 27 | 0 | 46.000000 | NaN | 2 | 0 | 0 | 0 | NaN |
| 9 | 10 | 0 | 0.189169 | 57 | 0 | 0.606291 | 23684.0 | 9 | 0 | 4 | 0 | 2.0 |
Last rows
| | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 149990 | 149991 | 0 | 0.055518 | 46 | 0 | 0.609779 | 4335.0 | 7 | 0 | 1 | 0 | 2.0 |
| 149991 | 149992 | 0 | 0.104112 | 59 | 0 | 0.477658 | 10316.0 | 10 | 0 | 2 | 0 | 0.0 |
| 149992 | 149993 | 0 | 0.871976 | 50 | 0 | 4132.000000 | NaN | 11 | 0 | 1 | 0 | 3.0 |
| 149993 | 149994 | 0 | 1.000000 | 22 | 0 | 0.000000 | 820.0 | 1 | 0 | 0 | 0 | 0.0 |
| 149994 | 149995 | 0 | 0.385742 | 50 | 0 | 0.404293 | 3400.0 | 7 | 0 | 0 | 0 | 0.0 |
| 149995 | 149996 | 0 | 0.040674 | 74 | 0 | 0.225131 | 2100.0 | 4 | 0 | 1 | 0 | 0.0 |
| 149996 | 149997 | 0 | 0.299745 | 44 | 0 | 0.716562 | 5584.0 | 4 | 0 | 1 | 0 | 2.0 |
| 149997 | 149998 | 0 | 0.246044 | 58 | 0 | 3870.000000 | NaN | 18 | 0 | 1 | 0 | 0.0 |
| 149998 | 149999 | 0 | 0.000000 | 30 | 0 | 0.000000 | 5716.0 | 4 | 0 | 0 | 0 | 0.0 |
| 149999 | 150000 | 0 | 0.850283 | 64 | 0 | 0.249908 | 8158.0 | 8 | 0 | 2 | 0 | 0.0 |